Access the complete analysis in the LOCI Dashboard.

## Performance Analysis Summary

Based on the comprehensive analysis of `project_id=2621b8c0-b5ce-11f0-b333-453f42058aa1`, comparing version `d92759bb-fa39-48d2-8e30-324d7703c52c` against baseline `032e8f46-edb9-425d-b2b4-b3ae82d31e9b`, the changes show minimal performance impact and no meaningful functional modifications.

### Performance Metrics Overview

The analysis reveals negligible performance variations; every observed delta falls within measurement noise.

### Technical Analysis

**Function-Level Insights:** Both functions showing the highest percentage changes were unmodified between versions, indicating that the variations represent measurement noise rather than code changes.

**CFG Comparison:** The control flow graphs show no structural differences between versions, consistent with the finding that the flagged functions were unmodified.

**GitHub Code Review:** The associated PR #221 removes unnecessary chat template patching from the Python conversion scripts, affecting only the model conversion process without impacting runtime inference performance.

### Impact Assessment

**Core Function Impact:** None of the critical inference functions (`llama_decode`, `llama_encode`, `llama_tokenize`) show performance changes, indicating no impact on tokens-per-second throughput.

**Power Efficiency:** All binaries maintain consistent power consumption profiles, with variations below measurement precision.

**Overall Assessment:** The sub-nanosecond timing variations are within normal system noise levels and do not represent functional regressions or performance concerns. The changes reflect the successful removal of conversion-time workarounds without affecting runtime performance.
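As a rough illustration of the reasoning above, here is a minimal sketch of classifying per-function timing deltas against a measurement-precision floor. The function name and the 1 ns floor are assumptions for illustration, not LOCI's actual methodology:

```python
# Hypothetical sketch: flag timing deltas below an assumed measurement
# precision floor as noise, mirroring how the sub-nanosecond variations
# above are dismissed. The 1 ns floor is illustrative only.

PRECISION_FLOOR_NS = 1.0

def classify_delta(baseline_ns: float, current_ns: float,
                   floor_ns: float = PRECISION_FLOOR_NS) -> str:
    """Label a per-function timing change as noise or a real shift."""
    delta = current_ns - baseline_ns
    if abs(delta) < floor_ns:
        return "noise"
    return "regression" if delta > 0 else "improvement"

# A 0.3 ns swing on an unmodified function is classified as noise:
print(classify_delta(baseline_ns=152.4, current_ns=152.7))  # noise
```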
Mirrored from ggml-org/llama.cpp#17289
Remove chat template patching that is no longer necessary:
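For context on what such patching looks like, the following is a hedged sketch of a conversion script overriding a model's chat template before writing it into the converted file's metadata. The function name and template text are illustrative and are not the actual code this PR removes:

```python
# Illustrative sketch only, not the code removed by this PR.
# Conversion scripts sometimes replaced a model's chat template with a
# hand-maintained fix before writing metadata; once upstream templates
# are correct, the override can simply be deleted.

ILLUSTRATIVE_OVERRIDE = (
    "{% for message in messages %}"
    "{{ message['role'] }}: {{ message['content'] }}\n"
    "{% endfor %}"
)

def patch_chat_template(metadata: dict) -> dict:
    """Replace the upstream chat template with a local override (workaround)."""
    metadata["tokenizer.chat_template"] = ILLUSTRATIVE_OVERRIDE
    return metadata
```

Removing the workaround means the template shipped with the source model is written through unchanged, with no patching step left in the conversion path.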